AITopics

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report (0.68)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Neural Information Processing SystemsMar-17-2026, 10:03:41 GMT

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving the its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.

artificial intelligence, machine learning, natural language, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.96)
Information Technology > Artificial Intelligence > Machine Learning (0.63)

Neural Information Processing SystemsFeb-11-2026, 18:08:07 GMT

Appendix

H.3 WinogenderSetup We follow the same setup as in Rae et al.[38].

gopher, machine learning, natural language, (20 more...)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsNov-21-2025, 15:11:31 GMT

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

computer programmer, debiasing word embedding, name change, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.96)
Information Technology > Artificial Intelligence > Machine Learning (0.63)

AlKhalifah, Khaloud S., Mashaabi, Malak, Al-Khalifa, Hend

Gender Stereotypes in Professional Roles Among Saudis: An Analytical Study of AI-Generated Images Using Language Models

arXiv.org Artificial IntelligenceSep-29-2025

This study investigates the extent to which contemporary Text-to-Image artificial intelligence (AI) models perpetuate gender stereotypes and cultural inaccuracies when generating depictions of professionals in Saudi Arabia. We analyzed 1,006 images produced by ImageFX, DALL-E V3, and Grok for 56 diverse Saudi professions using neutral prompts. Two trained Saudi annotators evaluated each image on five dimensions: perceived gender, clothing and appearance, background and setting, activities and interactions, and age. A third senior researcher adjudicated whenever the two primary raters disagreed, yielding 10,100 individual judgements. The results reveal a strong gender imbalance, with ImageFX outputs being 85\% male, Grok 86.6\% male, and DALL-E V3 96\% male, indicating that DALL-E V3 exhibited the strongest overall gender stereotyping. This imbalance was most evident in leadership and technical roles. Moreover, cultural inaccuracies in clothing, settings, and depicted activities were frequently observed across all three models. Counter-stereotypical images often arise from cultural misinterpretations rather than genuinely progressive portrayals. We conclude that current models mirror societal biases embedded in their training data, generated by humans, offering only a limited reflection of the Saudi labour market's gender dynamics and cultural nuances. These findings underscore the urgent need for more diverse training data, fairer algorithms, and culturally sensitive evaluation frameworks to ensure equitable and authentic visual outputs.

artificial intelligence, machine learning, natural language, (18 more...)

2509.21466

Country: Asia > Middle East > Saudi Arabia (0.69)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Health & Medicine (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Kalhor, Ghazal, Bahrak, Behnam

Probing Gender Bias in Multilingual LLMs: A Case Study of Stereotypes in Persian

arXiv.org Artificial IntelligenceSep-25-2025

Multilingual Large Language Models (LLMs) are increasingly used worldwide, making it essential to ensure they are free from gender bias to prevent representational harm. While prior studies have examined such biases in high-resource languages, low-resource languages remain understudied. In this paper, we propose a template-based probing methodology, validated against real-world data, to uncover gender stereotypes in LLMs. As part of this framework, we introduce the Domain-Specific Gender Skew Index (DS-GSI), a metric that quantifies deviations from gender parity. We evaluate four prominent models, GPT-4o mini, DeepSeek R1, Gemini 2.0 Flash, and Qwen QwQ 32B, across four semantic domains, focusing on Persian, a low-resource language with distinct linguistic features. Our results show that all models exhibit gender stereotypes, with greater disparities in Persian than in English across all domains. Among these, sports reflect the most rigid gender biases. This study underscores the need for inclusive NLP practices and provides a framework for assessing bias in other low-resource languages.

large language model, machine learning, natural language, (16 more...)

2509.20168

Country: Asia > Middle East > Iran (0.30)

Genre: Research Report > New Finding (0.68)

Industry:

Education (1.00)
Health & Medicine (0.93)
Law > Civil Rights & Constitutional Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceSep-10-2025

Automated Evaluation of Gender Bias Across 13 Large Multimodal Models

Contreras, Juan Manuel

Large multimodal models (LMMs) have revolutionized text-to-image generation, but they risk perpetuating the harmful social biases in their training data. Prior work has identified gender bias in these models, but methodological limitations prevented large-scale, comparable, cross-model analysis. To address this gap, we introduce the Aymara Image Fairness Evaluation, a benchmark for assessing social bias in AI-generated images. We test 13 commercially available LMMs using 75 procedurally-generated, gender-neutral prompts to generate people in stereotypically-male, stereotypically-female, and non-stereotypical professions. We then use a validated LLM-as-a-judge system to score the 965 resulting images for gender representation. Our results reveal (p < .001 for all): 1) LMMs systematically not only reproduce but actually amplify occupational gender stereotypes relative to real-world labor data, generating men in 93.0% of images for male-stereotyped professions but only 22.5% for female-stereotyped professions; 2) Models exhibit a strong default-male bias, generating men in 68.3% of the time for non-stereotyped professions; and 3) The extent of bias varies dramatically across models, with overall male representation ranging from 46.7% to 73.3%. Notably, the top-performing model de-amplified gender stereotypes and approached gender parity, achieving the highest fairness scores. This variation suggests high bias is not an inevitable outcome but a consequence of design choices. Our work provides the most comprehensive cross-model benchmark of gender bias to date and underscores the necessity of standardized, automated evaluation tools for promoting accountability and fairness in AI development.

large language model, machine learning, natural language, (19 more...)

2509.0705

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

Masoudian, Shahed, Escobedo, Gustavo, Strauss, Hannah, Schedl, Markus

Investigating Gender Bias in LLM-Generated Stories via Psychological Stereotypes

arXiv.org Artificial IntelligenceAug-6-2025

As Large Language Models (LLMs) are increasingly used across different applications, concerns about their potential to amplify gender biases in various tasks are rising. Prior research has often probed gender bias using explicit gender cues as counterfactual, or studied them in sentence completion and short question answering tasks. These formats might overlook more implicit forms of bias embedded in generative behavior of longer content. In this work, we investigate gender bias in LLMs using gender stereotypes studied in psychology (e.g., aggressiveness or gossiping) in an open-ended task of narrative generation. We introduce a novel dataset called StereoBias-Stories containing short stories either unconditioned or conditioned on (one, two, or six) random attributes from 25 psychological stereotypes and three task-related story endings. We analyze how the gender contribution in the overall story changes in response to these attributes and present three key findings: (1) While models, on average, are highly biased towards male in unconditioned prompts, conditioning on attributes independent from gender stereotypes mitigates this bias. (2) Combining multiple attributes associated with the same gender stereotype intensifies model behavior, with male ones amplifying bias and female ones alleviating it. (3) Model biases align with psychological ground-truth used for categorization, and alignment strength increases with model size. Together, these insights highlight the importance of psychology-grounded evaluation of LLMs.

artificial intelligence, large language model, natural language, (15 more...)

2508.03292

Country:

Europe (0.92)
North America > United States (0.28)
Asia > Middle East > UAE (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial IntelligenceMar-20-2025

From Structured Prompts to Open Narratives: Measuring Gender Bias in LLMs Through Open-Ended Storytelling

Chen, Evan, Zhan, Run-Jun, Lin, Yan-Bai, Chen, Hung-Hsuan

Large Language Models (LLMs) have revolutionized natural language processing, yet concerns persist regarding their tendency to reflect or amplify social biases present in their training data. This study introduces a novel evaluation framework to uncover gender biases in LLMs, focusing on their occupational narratives. Unlike previous methods relying on structured scenarios or carefully crafted prompts, our approach leverages free-form storytelling to reveal biases embedded in the models. Systematic analyses show an overrepresentation of female characters across occupations in six widely used LLMs. Additionally, our findings reveal that LLM-generated occupational gender rankings align more closely with human stereotypes than actual labor statistics. These insights underscore the need for balanced mitigation strategies to ensure fairness while avoiding the reinforcement of new stereotypes.

large language model, machine learning, natural language, (19 more...)

2503.15904

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New York > New York County > New York City (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(11 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (1.00)
Government > Regional Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Mohammadi, Fatemeh, Tamborini, Marta Annamaria, Ceravolo, Paolo, Nardocci, Costanza, Maghool, Samira

Identifying Gender Stereotypes and Biases in Automated Translation from English to Italian using Similarity Networks

arXiv.org Artificial IntelligenceFeb-17-2025

This paper is a collaborative effort between Linguistics, Law, and Computer Science to evaluate stereotypes and biases in automated translation systems. We advocate gender-neutral translation as a means to promote gender inclusion and improve the objectivity of machine translation. Our approach focuses on identifying gender bias in English-to-Italian translations. First, we define gender bias following human rights law and linguistics literature. Then we proceed by identifying gender-specific terms such as she/lei and he/lui as key elements. We then evaluate the cosine similarity between these target terms and others in the dataset to reveal the model's perception of semantic relations. Using numerical features, we effectively evaluate the intensity and direction of the bias. Our findings provide tangible insights for developing and training gender-neutral translation algorithms.

artificial intelligence, natural language, translation, (15 more...)

2502.11611

Country:

Europe (0.69)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Government (1.00)
Law > International Law (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)